AITopics | output representation

output representation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

RCPU: Rotation-Constrained Error Compensation for Structured Pruning of a Large Language Model

Haruta, Shuichiro, Matsumoto, Kazunori, Li, Zhi, Wang, Yanan, Kurokawa, Mori

arXiv.org Artificial Intelligence, Oct-10-2025

In this paper, we propose a rotation-constrained compensation method to address the errors introduced by structured pruning of large language models (LLMs). LLMs are trained on massive datasets and accumulate rich semantic knowledge in their representation space. In contrast, pruning is typically carried out with only a small amount of calibration data, which makes output mismatches unavoidable. Although direct least-squares fitting can reduce such errors, it tends to overfit to the limited calibration set, destructively modifying pretrained weights. To overcome this difficulty, we update the pruned parameters under a rotation constraint. This constrained update preserves the geometry of output representations (i.e., norms and inner products) and simultaneously re-aligns the pruned subspace with the original outputs. Furthermore, in rotation-constrained compensation, removing components that strongly contribute to the principal directions of the output makes error recovery difficult. Since input dimensions with large variance strongly affect these principal directions, we design a variance-aware importance score that ensures such dimensions are preferentially kept in the pruned model. By combining this scoring rule with rotation-constrained updates, the proposed method effectively compensates for errors while retaining the components likely to be more important in a geometry-preserving manner. In the experiments, we apply the proposed method to LLaMA-7B and evaluate it on WikiText-2 and multiple language understanding benchmarks. The results demonstrate consistently better perplexity and task accuracy compared with existing baselines.
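The geometry-preserving property mentioned in the abstract can be illustrated with a two-dimensional toy. The sketch below is not the authors' algorithm: it assumes the "pruned" outputs are simply a rotated copy of the "original" outputs, and it uses the closed-form best 2-D rotation as a stand-in for a general rotation-constrained fit. The helper names (`best_rotation_angle`, `rotate`) and the sample points are hypothetical.

```python
import math

def best_rotation_angle(xs, ys):
    """Angle theta minimizing sum ||R(theta) x_i - y_i||^2 over 2-D rotations.

    Expanding the squared error shows it is minimized when
    cos(theta) * sum(x_i . y_i) + sin(theta) * sum(x_i x y_i) is maximal,
    which gives theta = atan2(sum of cross terms, sum of dot terms).
    """
    dot = sum(x[0] * y[0] + x[1] * y[1] for x, y in zip(xs, ys))
    cross = sum(x[0] * y[1] - x[1] * y[0] for x, y in zip(xs, ys))
    return math.atan2(cross, dot)

def rotate(v, theta):
    """Rotate a 2-D vector counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

# Hypothetical "original" output vectors (illustrative values only).
original = [(1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]

# Assumption: pruning misaligned the outputs by a rotation of -0.5 rad.
pruned = [rotate(v, -0.5) for v in original]

# Fit the rotation that re-aligns the pruned outputs with the originals.
theta = best_rotation_angle(pruned, original)  # recovers 0.5 up to float error
aligned = [rotate(v, theta) for v in pruned]

# A rotation leaves norms and inner products of the pruned outputs unchanged.
norms_before = [math.hypot(*v) for v in pruned]
norms_after = [math.hypot(*v) for v in aligned]
inner_before = pruned[0][0] * pruned[1][0] + pruned[0][1] * pruned[1][1]
inner_after = aligned[0][0] * aligned[1][0] + aligned[0][1] * aligned[1][1]
```

An unconstrained least-squares map could also scale or shear the vectors to fit the calibration points, which is the overfitting risk the abstract describes; restricting the update to a rotation rules that out by construction.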

large language model, natural language, pruning, (18 more...)

arXiv.org Artificial Intelligence

2510.07782

Country:

Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > Japan > Honshū > Kantō > Saitama Prefecture > Saitama (0.04)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

Appendix Organization

Neural Information Processing Systems, Aug-16-2025, 22:40:24 GMT

Then, anyone will be able to freely retrieve/update any models from our model-zoo.

artificial intelligence, dataset, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Oregon (0.04)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Information Technology (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Instruction Learning Paradigms: A Dual Perspective on White-box and Black-box LLMs

Ren, Yanwei, Liu, Liu, Yu, Baosheng, Qiu, Jiayan, Chen, Quan

arXiv.org Artificial Intelligence, Jun-30-2025

Optimizing instructions for large language models (LLMs) is critical for harnessing their full potential in complex and diverse tasks. However, relying solely on white-box approaches demands extensive computational resources and offers limited representational capacity, while black-box models can incur prohibitive financial costs. To address these challenges, we introduce a novel framework that seamlessly merges the strengths of both paradigms. Black-box models provide high-quality, diverse instruction initializations, and white-box models supply fine-grained interpretability through hidden states and output features. By enforcing a semantic similarity constraint, these components fuse into a unified high-dimensional representation that captures deep semantic and structural nuances, enabling an iterative optimization process to refine instruction quality and adaptability. Extensive evaluations across a broad spectrum of tasks, ranging from complex reasoning to cross-lingual generalization, demonstrate that our approach consistently outperforms state-of-the-art baselines. This fusion of black-box initialization with advanced semantic refinement yields a scalable and efficient solution, paving the way for next-generation LLM-driven applications in diverse real-world scenarios. The source code will be released soon.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2506.21573

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Leicestershire > Leicester (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Transportation > Air (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)

Protocol-agnostic and Data-free Backdoor Attacks on Pre-trained Models in RF Fingerprinting

Zhao, Tianya, Wang, Ningning, Zhang, Junqing, Wang, Xuyu

arXiv.org Artificial Intelligence, May-5-2025

While supervised deep neural networks (DNNs) have proven effective for device authentication via radio frequency (RF) fingerprinting, they are hindered by domain shift issues and the scarcity of labeled data. The success of large language models has led to increased interest in unsupervised pre-trained models (PTMs), which offer better generalization and do not require labeled datasets, potentially addressing the issues mentioned above. However, the inherent vulnerabilities of PTMs in RF fingerprinting remain insufficiently explored. In this paper, we thoroughly investigate data-free backdoor attacks on such PTMs in RF fingerprinting, focusing on a practical scenario where attackers lack access to downstream data, label information, and training processes. To realize the backdoor attack, we carefully design a set of triggers and predefined output representations (PORs) for the PTMs. By mapping triggers and PORs through backdoor training, we can implant backdoor behaviors into the PTMs, thereby introducing vulnerabilities across different downstream RF fingerprinting tasks without requiring prior knowledge. Extensive experiments demonstrate the wide applicability of our proposed attack to various input domains, protocols, and PTMs. Furthermore, we explore potential detection and defense methods, demonstrating the difficulty of fully safeguarding against our proposed backdoor attack.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2505.00881

Country:

Europe > United Kingdom (0.14)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
Asia > Nepal (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

An Information-theoretic Multi-task Representation Learning Framework for Natural Language Understanding

Hu, Dou, Wei, Lingwei, Zhou, Wei, Hu, Songlin

arXiv.org Artificial Intelligence, Mar-6-2025

This paper proposes a new principled multi-task representation learning framework (InfoMTL) to extract noise-invariant sufficient representations for all tasks. It ensures sufficiency of shared representations for all tasks and mitigates the negative effect of redundant features, which can enhance language understanding of pre-trained language models (PLMs) under the multi-task paradigm. Firstly, a shared information maximization principle is proposed to learn more sufficient shared representations for all target tasks. It can avoid the insufficiency issue arising from representation compression in the multi-task paradigm. Secondly, a task-specific information minimization principle is designed to mitigate the negative effect of potential redundant features in the input for each task. It can compress task-irrelevant redundant information and preserve necessary information relevant to the target for multi-task prediction. Experiments on six classification benchmarks show that our method outperforms 12 comparative multi-task methods under the same multi-task settings, especially in data-constrained and noisy scenarios. Extensive experiments demonstrate that the learned representations are more sufficient, data-efficient, and robust.

infomtl, information, representation, (15 more...)

arXiv.org Artificial Intelligence

2503.04667

Country: Asia > China (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Review for NeurIPS paper: Towards Deeper Graph Neural Networks with Differentiable Group Normalization

Neural Information Processing Systems, Jan-23-2025, 07:49:31 GMT

Weaknesses: (1) Empirical results seem weak compared to other works [1] that aim to tackle the over-smoothing problem. According to Table 1, deep GNNs with DGN outperform those with other normalization mechanisms. However, performance still degrades as the GNNs are made deeper. Though the idea is somewhat incremental, the proposed Differentiable Group Normalization relates it indeed. However, the Instance Information Gain employs mutual information between the input features and output representations as a metric, which seems somewhat odd. According to Appendix F, the output representation is taken from the final prediction layer, which is the result of a linear transformation applied to the top hidden features.

deeper graph neural network, differentiable group normalization, neurips paper, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)

Joint Perception and Prediction for Autonomous Driving: A Survey

Dal'Col, Lucas, Oliveira, Miguel, Santos, Vítor

arXiv.org Artificial Intelligence, Dec-18-2024

Perception and prediction modules are critical components of autonomous driving systems, enabling vehicles to navigate safely through complex environments. The perception module is responsible for perceiving the environment, including static and dynamic objects, while the prediction module is responsible for predicting the future behavior of these objects. These modules are typically divided into three tasks: object detection, object tracking, and motion prediction. Traditionally, these tasks are developed and optimized independently, with outputs passed sequentially from one to the next. However, this approach has significant limitations: computational resources are not shared across tasks, the lack of joint optimization can amplify errors as they propagate throughout the pipeline, and uncertainty is rarely propagated between modules, resulting in significant information loss. To address these challenges, the joint perception and prediction paradigm has emerged, integrating perception and prediction into a unified model through multi-task learning. This strategy not only overcomes the limitations of previous methods, but also enables the three tasks to have direct access to raw sensor data, allowing richer and more nuanced environmental interpretations. This paper presents the first comprehensive survey of joint perception and prediction for autonomous driving. We propose a taxonomy that categorizes approaches based on input representation, scene context modeling, and output representation, highlighting their contributions and limitations. Additionally, we present a qualitative analysis and quantitative comparison of existing methods. Finally, we discuss future research directions based on identified gaps in the state-of-the-art.

artificial intelligence, machine learning, representation, (17 more...)

arXiv.org Artificial Intelligence

2412.14088

Country:

Europe > Portugal > Aveiro > Aveiro (0.05)
Europe > Switzerland (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
(2 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.45)

Industry: Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Generalized Probabilistic Attention Mechanism in Transformers

Heo, DongNyeong, Choi, Heeyoul

arXiv.org Artificial Intelligence, Oct-20-2024

The Transformer architecture has become widely adopted due to its demonstrated success, attributed to the attention mechanism at its core. Despite these successes, the attention mechanism of Transformers is associated with two well-known issues: rank-collapse and gradient vanishing. In this paper, we present a theoretical analysis showing that it is inherently difficult to address both issues simultaneously in the conventional attention mechanism. To handle these issues, we introduce a novel class of attention mechanism, referred to as generalized probabilistic attention mechanism (GPAM), and its dual-attention implementation within the Transformer architecture. Unlike conventional attention mechanisms, GPAM allows for negative attention scores while preserving a fixed total sum. We provide theoretical evidence that the proposed dual-attention GPAM (daGPAM) effectively mitigates both the rank-collapse and gradient vanishing issues, which are difficult to resolve simultaneously with the conventional attention mechanisms. Furthermore, we empirically validate this theoretical evidence, demonstrating the superiority of daGPAM over alternative attention mechanisms that were proposed to address the same issues. Additionally, we demonstrate the practical benefits of GPAM in natural language processing tasks, such as language modeling and neural machine translation.

The Transformer model, as introduced by Vaswani et al. (2017), has emerged as a pivotal architecture driving the advancement of contemporary deep learning models across various domains, including natural language processing (Brown et al., 2020), audio signal processing (Gulati et al., 2020), and image processing (Dosovitskiy et al., 2021). Central to the Transformer's success is the attention mechanism, which facilitates the contextualization of input token representations.
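The property described in the abstract, attention weights that may be negative yet still sum to a fixed total, can be made concrete with a small sketch. The combination used here, `(1 + lam) * softmax(a) - lam * softmax(b)`, is an assumption chosen for illustration because it satisfies that property; it should not be read as the paper's definition of GPAM or daGPAM. The function names, score values, and `lam` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dual_attention_weights(pos_scores, neg_scores, lam):
    """Combine two softmax distributions into signed weights.

    Each softmax sums to 1, so the result sums to (1 + lam) - lam = 1,
    while individual entries can drop below zero when lam > 0.
    This is an illustrative construction, not the paper's formula.
    """
    p = softmax(pos_scores)
    n = softmax(neg_scores)
    return [(1 + lam) * pi - lam * ni for pi, ni in zip(p, n)]

# Hypothetical raw scores for one query over three keys.
weights = dual_attention_weights([2.0, 0.0, -1.0], [-1.0, 0.0, 2.0], lam=0.5)

total = sum(weights)      # stays at 1 up to floating-point error
lowest = min(weights)     # negative for these scores

# The output is still a weighted sum of values; with signed weights it is an
# affine combination, so it can fall outside the range spanned by the values.
values = [1.0, 2.0, 3.0]
output = sum(w * v for w, v in zip(weights, values))
```

With conventional softmax attention every weight lies in [0, 1], so the output is always a convex combination of the value vectors; allowing signed weights with a fixed sum relaxes that restriction.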

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2410.15578

Country:

Europe > Belgium (0.04)
Asia > South Korea > Gyeongsangbuk-do > Pohang (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
